Summary
3FF69J_2
2025-12-18
- This is a circos plot generated by Bakta (v1.11).
- It is a circular representation of all of the contigs, and contains information such as GC skew and feature location. See Bakta’s explanation for more detail.
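GC skew, one of the tracks on the circos plot, is commonly computed as (G − C) / (G + C) over windows along each contig. Here is a minimal sketch. The window and step sizes are arbitrary assumptions, and this is not Bakta's plotting code:

```python
def gc_skew(seq: str, window: int = 1000, step: int = 1000) -> list[tuple[int, float]]:
    """Return (window_start, skew) pairs, where skew = (G - C) / (G + C)."""
    seq = seq.upper()
    results = []
    # Always emit at least one window, even if the sequence is shorter than `window`.
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0  # no G or C: report 0
        results.append((start, skew))
    return results
```

In bacterial chromosomes, the sign of cumulative GC skew often flips near the origin and terminus of replication. That is why GC skew is a useful track on a circular plot.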
Statistics
ONT QC stats
| Metric | Value |
|---|---|
| Total bp sequenced | 1,332,563,138 bp |
| Total number of reads | 186,416 reads |
| Longest read | 79,313 bp |
| Raw coverage | 352x |
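Raw coverage is the total number of sequenced bases divided by the genome size. The sketch below uses the assembled genome length from the assembly stats as the genome size. This is an assumption, because the pipeline may use its own genome-size estimate:

```python
total_bases = 1_332_563_138  # "Total bp sequenced" from the table above
genome_size = 3_775_517      # assumption: assembled genome length (assembly stats)

raw_coverage = total_bases / genome_size
print(f"{raw_coverage:.1f}x")  # 352.9x
```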
Assembly stats
| Contig(s) | Length (bp) | Genome size (Mb) | Topology | Species | Coverage (x) | Number of genes annotated |
|---|---|---|---|---|---|---|
| genome | 3775517 bp | 3.78 Mb | linear | Bdellovibrio bacteriovorus | 334.68x | 3601 genes |
| contig_1 | 2796631 bp | 2.80 Mb | linear | Bdellovibrio bacteriovorus | 336.87x | |
| contig_2 | 978107 bp | 0.98 Mb | linear | Bdellovibrio bacteriovorus | 328.68x | |
| contig_3 | 779 bp | 0.00 Mb | linear | | 12.84x | |
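The genome row's coverage appears to be the length-weighted mean of the per-contig coverages. This is our reading of the table, not a documented definition. The short check below recomputes the total length and the weighted coverage from the per-contig rows:

```python
# (length in bp, coverage in x) for each contig, copied from the table above
contigs = {
    "contig_1": (2_796_631, 336.87),
    "contig_2": (978_107, 328.68),
    "contig_3": (779, 12.84),
}

total_length = sum(length for length, _ in contigs.values())
weighted_depth = sum(length * depth for length, depth in contigs.values()) / total_length

print(total_length)              # 3775517
print(f"{weighted_depth:.2f}x")  # 334.68x
```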
Results
Contig analysis
- This is an assembly graph generated by Bandage (v0.8.1).
- It shows the contigs and their connections in the assembly.
- Generally, the best graph you can have for a bacterial genome is a single circle (and possibly some other circles if you have plasmids).
- If a contig is black, it has a relative depth of 1x.
- If a contig is red, it has a higher relative depth, suggesting it is more abundant in the sample.
- Note: This is a graph of the pre-polished assembly.
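The report does not say exactly how relative depth is normalized. The minimal sketch below assumes that relative depth means a contig's depth divided by the genome-wide depth from the assembly stats table. Bandage's own depths come from the pre-polished graph and may differ slightly, and Bandage's color mapping is not reproduced here:

```python
GENOME_DEPTH = 334.68  # genome-wide coverage from the assembly stats table

contig_depths = {"contig_1": 336.87, "contig_2": 328.68, "contig_3": 12.84}


def relative_depth(depth: float, reference: float = GENOME_DEPTH) -> float:
    """Depth relative to the genome-wide depth (1.0 means 'typical' depth)."""
    return depth / reference


for name, depth in contig_depths.items():
    print(f"{name}: {relative_depth(depth):.2f}x")
# contig_1: 1.01x
# contig_2: 0.98x
# contig_3: 0.04x
```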
Species analysis
Sourmash
- Sourmash (v4.9.4) is used for fast, k-mer-based identification of genome content by comparing assemblies against a custom database that includes GTDB rs226, RefSeq plasmids, and phage sequences.
| Contig(s) | species | genus | family | order | class | phylum | superkingdom |
|---|---|---|---|---|---|---|---|
| genome | Bdellovibrio bacteriovorus | Bdellovibrio | Bdellovibrionaceae | Bdellovibrionales | Bdellovibrionia | Bdellovibrionota | Bacteria |
| contig_1 | Bdellovibrio bacteriovorus | Bdellovibrio | Bdellovibrionaceae | Bdellovibrionales | Bdellovibrionia | Bdellovibrionota | Bacteria |
| contig_2 | Bdellovibrio bacteriovorus | Bdellovibrio | Bdellovibrionaceae | Bdellovibrionales | Bdellovibrionia | Bdellovibrionota | Bacteria |
| contig_3 | not classified | not classified | not classified | not classified | not classified | not classified | not classified |
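Sourmash compares FracMinHash sketches. Every k-mer is hashed, and only hashes below a threshold set by the `scaled` parameter are kept, so each sketch is a small, consistent subsample of the k-mers. The toy sketch below illustrates the idea and estimates containment, which is the fraction of a query's sketch that is also found in a reference sketch. It is not sourmash's implementation. blake2b stands in for sourmash's MurmurHash3, and k = 21 and scaled = 1000 are common sourmash settings, not confirmed settings for this report. Taxonomic assignment from the matches is not shown:

```python
import hashlib

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int):
    """Yield the lexicographically smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield min(kmer, reverse_complement(kmer))


def hash64(kmer: str) -> int:
    # Stand-in 64-bit hash (sourmash itself uses MurmurHash3).
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def fracminhash(seq: str, k: int = 21, scaled: int = 1000) -> set[int]:
    """Keep only hashes below 2**64 / scaled, i.e. roughly 1 in `scaled` k-mers."""
    max_hash = 2**64 // scaled
    return {h for h in map(hash64, canonical_kmers(seq, k)) if h < max_hash}


def containment(query: set[int], reference: set[int]) -> float:
    """Fraction of the query sketch that is also present in the reference sketch."""
    return len(query & reference) / len(query) if query else 0.0
```

Because the sketch keeps only about 1 in `scaled` k-mers, a very short contig may produce an empty or nearly empty sketch and therefore no classification.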
Mash
- This shows the highest scoring Mash (v2.3) hit against a prepared RefSeq database for each contig in your assembly (database download in section 3.1.4).
- The percent identity is estimated from the number of k-mers that matched the reference.
| contig | length (bp) | species | est. %ID |
|---|---|---|---|
| contig_1 | 2,796,631 | NC_005363.1 Bdellovibrio bacteriovorus HD100 | 98.6 |
| contig_2 | 978,107 | NC_005363.1 Bdellovibrio bacteriovorus HD100 | 93.8 |
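Two standard ways to turn shared k-mers into an identity estimate are shown below. The `mash dist` distance is D = −(1/k) ln(2j / (1 + j)), where j is the Jaccard index, giving identity ≈ 1 − D. `mash screen` instead estimates identity as containment^(1/k). Which of these produced the "est. %ID" column is not stated in the report, and k = 21 (the Mash default) is an assumption:

```python
import math


def mash_distance(jaccard: float, k: int = 21) -> float:
    """Mash distance from a Jaccard index estimate."""
    if jaccard <= 0:
        return 1.0  # no shared k-mers: treat as maximal distance
    return -math.log(2 * jaccard / (1 + jaccard)) / k


def identity_from_jaccard(jaccard: float, k: int = 21) -> float:
    """Percent identity as 100 * (1 - Mash distance)."""
    return 100 * (1 - mash_distance(jaccard, k))


def identity_from_containment(containment: float, k: int = 21) -> float:
    """Percent identity as in mash screen: containment ** (1/k)."""
    return 100 * containment ** (1 / k)
```

Because of the 1/k exponent, even a modest fraction of shared k-mers maps to a high percent identity. This is why values such as 93.8% can still reflect substantial sequence divergence.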
See the FAQ for more information.
Overview of sequencing reads
- Each dot on the graph represents a single read, plotted by its length and its mean Phred quality score.
- Red dots are reads not used in the assembly.
- Blue dots are reads used in the assembly.
- The histograms along the top and right show the distribution of read lengths and Phred scores, respectively.
- The median Phred line shows the median Phred score of all reads used in the assembly.
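A read's mean quality can be computed in two ways. One is the arithmetic mean of its per-base Phred scores. The other is the Phred value of its mean per-base error probability. The report does not say which one the plot uses. The sketch below assumes the probability-space version, which is a common choice for ONT reads, and Phred+33 FASTQ encoding:

```python
import math
from statistics import median


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average per-base error probabilities, then convert back to a Phred score."""
    if not quality:
        raise ValueError("empty quality string")
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def median_read_phred(qualities) -> float:
    """Median of per-read mean Phred scores (assumed to match the median Phred line)."""
    return median(mean_phred(q) for q in qualities)
```

Consider a read with one Q10 base and one Q20 base. The arithmetic mean is 15, but the probability-space mean is about Q12.6, because the Q10 base dominates the expected number of errors.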
| rN50 | rNG50 |
|---|---|
| 11,315 bp | 60,206 bp |
- These are assembly statistics that help assess read/assembly quality.
- rN50 is the “reads N50”: the read length L such that reads of length L or longer contain at least 50% of all bases in the reads used for assembly (equivalently, the shortest read among the longest reads that together make up half of the bases). It is analogous to the familiar contig N50 (see a general N50 reference for background), except that rN50 uses reads instead of contigs.
- rNG50 is the “genome reads N50”. We made this statistic up! However, it’s analogous to an NG50, which uses the known or estimated genome size instead of the total sequence length. After assembly, we use the assembled genome length to find the read length L such that reads of length L or longer add up to at least half the genome length.
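Both statistics can be computed with one helper that sorts reads from longest to shortest and stops once a target number of bases is reached. The target is half the total read bases for rN50 and half the genome size for rNG50. The following is a minimal sketch based on the definitions above:

```python
def n50_like(lengths, target: float):
    """Length L such that reads of length >= L sum to at least `target` bases."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None  # the reads never add up to the target


def rn50(lengths):
    return n50_like(lengths, sum(lengths) / 2)


def rng50(lengths, genome_size: int):
    return n50_like(lengths, genome_size / 2)
```

With deep sequencing (here about 350x), only a handful of the very longest reads are needed to reach half the genome length. This is why rNG50 is much larger than rN50.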
Assembly completeness
CheckM
- CheckM (v1.2.2) assesses the quality of genome assemblies. It uses a set of reference gene profiles to assess the completeness, contamination, and strain heterogeneity of a given assembly.
- This plot shows the contigs in which specific CheckM reference (marker) genes were found. The reference genes are almost always chromosomal, so if a contig's plot shows that it contains reference genes, that contig is likely chromosomal.
- Why do my completeness and contamination numbers look slightly off?
| Marker lineage | Completeness (%) | Contamination (%) |
|---|---|---|
| k__Bacteria (UID3187) | 97.75 | 0 |
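At its core, CheckM treats missing single-copy marker genes as incompleteness and extra copies as contamination. The real tool computes these over collocated marker sets for the chosen lineage. The sketch below is a deliberately simplified version that treats every marker independently. It is meant only to show the idea and will not reproduce CheckM's numbers:

```python
def completeness_contamination(marker_counts: dict[str, int]) -> tuple[float, float]:
    """Simplified scores from copy counts of single-copy marker genes.

    Completeness: percentage of markers found at least once.
    Contamination: extra copies beyond the first, as a percentage of all markers.
    """
    total = len(marker_counts)
    if total == 0:
        raise ValueError("no markers supplied")
    found = sum(1 for count in marker_counts.values() if count >= 1)
    extra = sum(count - 1 for count in marker_counts.values() if count > 1)
    return 100 * found / total, 100 * extra / total
```

For example, a genome missing one of four markers and carrying a duplicate of another would score 75% complete and 25% contaminated under this simplification.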
FAQ
Where are my FASTA files? Where are my annotated DNA sequences?
If your sequencing run succeeded, these files can be found in the accompanying annotation folder, which was generated by Bakta. Bioinformatics file extensions can be confusing and somewhat inconsistent. Here is a quick guide to some of the file extensions you will find in your annotation folder:
- FASTA files:
  - .fna (contig nucleotide sequences)
  - .faa (protein amino acid sequences)
  - .ffn (gene nucleotide sequences)
- GenBank files:
  - .gbff (annotated contig sequences)
You should be able to open any of these files in your favorite sequence editor (e.g., Geneious, SnapGene, Benchling, MacVector, CLC, UGENE).
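If you prefer to inspect the .fna file programmatically, a small FASTA reader is enough to list contig lengths and GC content. This is a minimal, dependency-free sketch. The file path in the usage comment is hypothetical:

```python
def parse_fasta(lines) -> dict[str, str]:
    """Parse FASTA lines into {record_id: sequence}; the ID is the first word after '>'."""
    records, name, chunks = {}, None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_fasta(path: str) -> dict[str, str]:
    with open(path) as handle:
        return parse_fasta(handle)


def gc_percent(seq: str) -> float:
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Usage (hypothetical path):
# for name, seq in read_fasta("annotation/sample.fna").items():
#     print(name, len(seq), f"{gc_percent(seq):.1f}% GC")
```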
What is a “contig”?
- A contig is a contiguous segment of DNA assembled from the sequencing reads. It may or may not represent an entire genome or an entire plasmid. Generally, if your assembly graph consists solely of circles, each contig represents a single genome or plasmid.
My assembly graph is not a circle – why does it look like a complete mess?
- If your plot is not a circle and has a more complicated path, this could be due to several reasons, including:
- Your DNA sample was degraded and there were not enough longer reads to bridge repetitive regions (investigate your reads to assess). Check out our guide on preparing bacterial DNA samples for sequencing. This is the most likely cause.
- Your genome is incredibly repetitive due to IS elements or other genomic features.
- Your genome was very large and there were not enough reads to satisfactorily cover the genome.
- You had a contaminant organism which reduced the overall genome coverage.
Why does my Mash analysis say that my contig is a plasmid/phage when that is clearly incorrect?
- Since Mash searches a RefSeq database that includes bacterial genomes, plasmids, and phages, a chromosomal contig can be labeled as a plasmid or phage if a plasmid or phage sequence contained within that chromosome (e.g., a prophage) scores higher than the chromosome itself.
- This is why we also use Sourmash, which has a different reference database and search strategy; if Mash gives strange results, Sourmash can be a good second opinion.
- CheckM also searches bacterial and archaeal marker profiles, giving an independent third opinion.
My genome looks good, but why is CheckM reporting only 98% complete and 3% contamination?
- While CheckM is a fantastic tool, it rarely scores genomes as 100% complete with 0% contamination, even when run on “perfect” genomes. Biology is weird, and the supposedly single-copy genes that CheckM looks for may be legitimately missing or duplicated leading to slightly erroneous results.
- Some of the discrepancies may be explained by small errors in the assembly process (see below), but it is more likely that the above is causing the discrepancy.
My contigs are circles and everything looks good – is my genome assembly perfect?
- It’s possible! However, it’s more likely that your genome assembly still contains SNPs (single nucleotide polymorphisms), especially in regions where ONT reads have higher error rates, such as long homopolymers. It’s also possible, though less likely than SNPs, that your genome assembly contains medium-sized structural errors (e.g., a 50 bp deletion at the end of a contig). For the very best possible assembly, we recommend trying Trycycler to assemble your genomes and polishing with Illumina reads. Trycycler is a manual pipeline and can be quite labor-intensive. While very high quality, the genome assemblies we return should be treated as draft assemblies.
How do I interpret the polished stats file?
The {sample}_polished-stats.txt file shows position-by-position information about how Polypolish processed your assembly. Here’s what the columns mean:
Key Columns
- name: Contig name
- pos: Position in the contig
- base: Original base from the long-read assembly
- depth: Average read depth at this position
- valid: Number of valid short reads covering this position
- pileup: Base composition from short reads (e.g., “Tx258” means 258 T’s)
- status: Polypolish’s decision for this position
- new_base: Final base in the polished assembly
Status Meanings
- kept: Original base was retained (one valid option which matches the assembly base)
- changed: Base was changed based on short read consensus (one valid option which differs from the assembly base)
- too_close: Position too close to another position that was polished (one or more options are neither valid nor invalid)
- multiple: Multiple possible corrections (ambiguous; more than one valid option)
- low_depth: Insufficient short-read depth (below --min_depth)
- none: No valid options
Quick Assessment
- Good polishing: Most positions show “kept” status with good depth (>20x)
- Low coverage: Positions with “low_depth” status may need more sequencing
- Corrections made: Look for “changed” status to see where changes occurred
- Quality issues: Many “multiple” or “too_close” statuses may indicate assembly problems
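The quick assessment above can be automated by counting statuses and listing changed positions. The sketch below assumes that {sample}_polished-stats.txt is tab-separated with a header row using the column names listed above. Whether `pos` is 0-based or 1-based is not stated here:

```python
import csv
from collections import Counter


def summarize_polish(lines):
    """Count Polypolish statuses and collect (contig, pos, old_base, new_base) for changes."""
    reader = csv.DictReader(lines, delimiter="\t")  # assumes a tab-separated header row
    statuses = Counter()
    changes = []
    for row in reader:
        statuses[row["status"]] += 1
        if row["status"] == "changed":
            changes.append((row["name"], int(row["pos"]), row["base"], row["new_base"]))
    return statuses, changes


# Usage (replace "sample" with your sample name):
# with open("sample_polished-stats.txt", newline="") as handle:
#     statuses, changes = summarize_polish(handle)
#     print(statuses.most_common())
```

A high share of "kept" alongside only a few "changed", "multiple", or "too_close" entries matches the "good polishing" pattern described above.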
How is the assembly generated?
- Remove the worst 5% of FASTQ reads using Filtlong v0.2.1 (default parameters)
- Estimate genome size using Autocycler helper functions
- Generate multiple subsampled read sets using Autocycler with optimal coverage for each subsample
- Perform multiple assemblies using Autocycler with three different assemblers:
  - Flye v2.9.6+ with parameters optimized for high-quality ONT reads
  - Hifiasm for generating high-quality assemblies from long reads
  - Plassembler v1.8.0+ for detecting and assembling plasmids
- Remove low depth and small contigs and then compress, cluster, trim, resolve, and clean assemblies using Autocycler to identify the best consensus assembly from all assemblers
- Rotate the assembly to start at the optimal position using dnaapler
- Polish the consensus assembly via Medaka v1.8.0 using the filtered reads
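The first step keeps the best 95% of the reads. Filtlong's actual scoring combines read length and quality and is not reproduced here. The sketch below is a simplified illustration. It ranks reads by a precomputed score (hypothetical, e.g. mean read quality) and keeps the best reads until 95% of all bases are retained. Measuring the percentage in bases is an assumption:

```python
def keep_best_by_bases(reads, keep_percent: float = 95.0) -> list[str]:
    """reads: iterable of (read_id, length, score). Higher score = better read.

    Keeps the highest-scoring reads until at least `keep_percent` of all bases are kept.
    """
    reads = list(reads)
    budget = sum(length for _, length, _ in reads) * keep_percent / 100
    kept, kept_bases = [], 0
    for read_id, length, _score in sorted(reads, key=lambda r: r[2], reverse=True):
        if kept_bases >= budget:
            break
        kept.append(read_id)
        kept_bases += length
    return kept
```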
How is the polished assembly generated?
- Quality Control: Short reads are processed using fastp with quality filtering (Phred ≥15, length ≥50 bp, adapter trimming, poly-G trimming)
- Alignment: Quality-controlled short reads are aligned to the long-read reference assembly using bwa-mem2
- Polishing: The long-read assembly is polished using Polypolish with the aligned short reads
- Uses --careful mode when coverage < 25x
- Uses
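The alignment and polishing steps, including the coverage-dependent --careful switch, can be sketched as a command builder. File names are hypothetical. The flags reflect our understanding of the bwa-mem2 and Polypolish command lines (`-a` reports all alignments per read, which Polypolish relies on), not this pipeline's exact invocation. fastp trimming is omitted:

```python
def polishing_commands(draft: str, reads_1: str, reads_2: str,
                       short_read_coverage: float, threads: int = 8) -> list[str]:
    """Return shell command strings for short-read polishing (hypothetical file names)."""
    cmds = [
        f"bwa-mem2 index {draft}",
        # -a: report all alignments for each read, not just the best one
        f"bwa-mem2 mem -t {threads} -a {draft} {reads_1} > alignments_1.sam",
        f"bwa-mem2 mem -t {threads} -a {draft} {reads_2} > alignments_2.sam",
    ]
    # Switch to careful mode when short-read coverage is below 25x
    careful = "--careful " if short_read_coverage < 25 else ""
    cmds.append(f"polypolish polish {careful}{draft} alignments_1.sam alignments_2.sam > polished.fasta")
    return cmds
```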
How is the downstream analysis generated?
Depending on whether you requested our regular or hybrid service, the latest assembly is used for our analysis, including:
- annotation
  - Bakta v1.11
- contig analysis
  - Bandage v0.8.1
- genome completeness and contamination
  - CheckM v1.2.2
- species / plasmid identification